Checker Design Doc 1: Kick-Off

Pain points in Meridian architecture

  1. Manually releasing rewards is tedious, and it will become even more so once we have multiple subnets.
  1. All measurements need to fit into spark-evaluate process memory
  1. Measurements are serialized as JSON, which is wasteful. A denser format like CSV could save a lot of bandwidth and storage space, improve download speeds, and reduce our Storacha bill.
  1. We are not committing the evaluated measurements on the chain. Therefore, most data aggregation must happen inside spark-evaluate. As a result, we have two GH repositories sharing the same database: spark-evaluate writes the data and manages the DB schema, while spark-stats reads the data.
  1. The process for recording & publishing measurements uses Postgres as a persistent queue. We write 200k measurements every 20 minutes and delete them soon after they are recorded. This creates a lot of WAL (write-ahead log) entries; we are seeing warnings about writing too much to the WAL, and Miroslav also suspects this is the reason why a failover from the primary server to the replica takes many minutes to complete.
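
To illustrate the JSON-vs-CSV overhead from point 3, here is a minimal sketch. The field names are illustrative, not the actual Spark measurement schema; the point is that JSON repeats every key in every record, while CSV stores the keys once in a shared header row:

```javascript
// Hypothetical measurement record (field names are assumptions for
// illustration, not the real spark-evaluate schema).
const measurement = {
  participantAddress: '0xf39Fd6e51aad88F6F4ce6aB8827279cffFb92266',
  retrievalResult: 'OK',
  timeToFirstByteMs: 312,
};

// JSON: keys and quoting are repeated in every record.
const asJson = JSON.stringify(measurement);

// CSV: keys appear once in a header row shared by all records;
// each record carries only the values.
const header = Object.keys(measurement).join(',');
const asCsvRow = Object.values(measurement).join(',');

console.log(asJson.length, 'bytes per record as JSON');
console.log(asCsvRow.length, 'bytes per record as CSV (plus one shared header)');
```

At 200k records per round, the per-record savings multiply into a meaningful reduction in upload bandwidth and stored bytes.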
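
A back-of-envelope estimate of the WAL churn from point 5. The row count and round length come from the doc; the average serialized row size is an assumption for illustration:

```javascript
// WAL volume estimate for the insert-then-delete queue pattern.
const rowsPerRound = 200_000;        // from the doc: 200k measurements per round
const bytesPerRow = 1_000;           // ASSUMED average serialized row size
const roundsPerDay = (24 * 60) / 20; // one round every 20 minutes => 72/day

// Every row is written once (INSERT) and removed soon after (DELETE).
// Both operations are logged in the WAL, so the queue pushes roughly
// twice the raw data volume through the log, before counting index updates.
const walBytesPerDay = rowsPerRound * bytesPerRow * roundsPerDay * 2;

console.log((walBytesPerDay / 1e9).toFixed(1) + ' GB of WAL per day');
```

Even at a modest assumed row size, the pattern generates tens of GB of WAL daily for data that is deleted almost immediately, which is consistent with the warnings and the slow failover we are observing.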